class: center, middle, inverse, title-slide

# Lecture 5
## The Effects Model
### Psych 10 C
### University of California, Irvine
### 04/08/2022

---

## Null Model

- Last class we used mathematical notation and the Normal distribution to represent what we called the **Null Model**.

--

- This model assumed that all `\(i = 1,\dots, 4\)` observed participants of the `\(j=1,2\)` groups were samples from the same distribution.

--

- In other words, it formalizes our verbal hypothesis that there are no differences between groups.

--

- The **Null Model** is defined as:

`$$y_{ij} \sim \text{Normal}(\mu, \sigma^2)$$`

--

- Finally, we said that because we don't know the values of our parameters `\(\mu\)` and `\(\sigma^2\)`, which completely define the Normal distribution, we need to infer (learn) them from our observations.

--

- This is called **Statistical Inference**, but how can we actually get those numbers?

---

## Statistical Inference for the Normal distribution

- One of the key advantages of the Normal distribution is that its parameters are directly associated with the expectation and the variance of a random variable.

--

- Our model assumes that our observations are random variables that follow a Normal distribution with parameters `\(\mu\)` and `\(\sigma^2\)`.

--

- If a random variable `\(y\)` follows a Normal distribution with parameters `\(\mu\)` and `\(\sigma^2\)`, then we know that the following two statements are **TRUE**:

`$$\mathbb{E}(y) = \mu$$`

and

`$$\mathbb{V}ar(y) = \sigma^2$$`

--

- Remember that we already have a very good approximation to both `\(\mathbb{E}(y)\)` and `\(\mathbb{V}ar(y)\)`!

---

class: inverse, middle, center

# Estimators

---

## Estimators

- In week 1 we talked about statistics as functions of our observations.

--

- A statistic that is used to approximate a parameter (like `\(\mu\)`) in a statistical model is called an **estimator**.

--

- Given that we know that our best statistic for the expected value of a random variable
is the mean or average of our observations, we can use it as an **estimator** for `\(\mu\)`.

--

- We denote estimators by adding a "hat" on top of the Greek character:

`$$\hat{\mu} = \frac{1}{n} \sum_{i} \sum_{j} y_{ij}$$`

--

- Here, `\(n\)` represents the total number of observations that we added to calculate the mean, and the indices `\(i\)` and `\(j\)` denote the observation number and the group, respectively.

---

## Variance estimator

- We also had a good approximation for the variance of a random variable in the sample variance `\(s^2\)`. However, we will write it slightly differently this time to make other models easier to understand.

--

- Our estimator for the variance will be denoted as:

`$$\hat{\sigma}^2_0 = \frac{1}{n} \sum_i \sum_j \left(y_{ij}-\hat{\mu}\right)^2$$`

--

- If you look at your previous notes, you will see that we replaced the value of the mean `\(\bar{y}\)` with our **estimator** `\(\hat{\mu}\)` and that we changed from `\(s^2\)` to `\(\hat{\sigma}_0^2\)`.

--

- The subscript "0" indicates that this is an estimator for the variance of the **Null Model**.

---

## The Null Model

- Together, we refer to our new estimators `\(\hat{\mu}\)` and `\(\hat{\sigma}_0^2\)` as:

--

1. Model Prediction: `\(\hat{\mu}\)`

--

1. Mean Squared Error: `\(\hat{\sigma}_0^2\)`

--

- When using this type of statistical model (as we will for most of this class), we will also be interested in the Sum of Squared Errors, which is denoted as:

`$$SSE_0 = \sum_i \sum_j \left(y_{ij}-\hat{\mu}\right)^2$$`

--

- Again, we added a "0" to make it clear that this is the Sum of Squared Errors associated with the **Null Model**.

---

- Form teams of 3 and calculate the model prediction, the sum of squared errors and the mean squared error of the **Null Model** for the smokers data:
---

## Smokers data: Null Model

.can-edit.key-likes[
- Prediction:
]

.can-edit.key-likes[
- SSE:
]

.can-edit.key-likes[
- Mean Squared Error:
]

---

class: inverse, middle, center

# The Effects Model

---

## Effects Model

- Now we will formalize our second model.

--

- Our original problem stated that we wanted to know if there were any differences in lung capacity between smokers and non-smokers.

--

- Our first model was constructed on the idea that there are no differences between groups.

--

- We will call the model that assumes that groups are different the **Effects Model**.

---

## Effects Model

- Following the logic used to formalize the Null Model, we will use our observation notation `\(y_{ij}\)` and the Normal distribution as the statistical model.

--

- In mathematical notation the **Effects Model** is:

`$$y_{i1}\sim\text{Normal}(\mu_1,\sigma_e^2)$$`

`$$y_{i2}\sim\text{Normal}(\mu_2,\sigma_e^2)$$`

--

or

`$$y_{ij}\sim\text{Normal}(\mu_j,\sigma_e^2)$$`

--

- Notice that both statements convey the same information; the second one is just shorter.

--

- This new statistical model formalizes the idea that the `\(i = 1, \dots, 4\)` observations in each group `\(j = 1, 2\)` are samples from two different Normal distributions, one centered at `\(\mu_1\)` and the other centered at `\(\mu_2\)`.

---

## Effects Model

- Another important thing to notice in the definition of the model is that we assume that both distributions have the same variance, denoted as `\(\sigma_e^2\)`.

--

- This means that our model will assume that the only difference between the two groups is their expected value and not how much they vary.

--

- This will allow us to use all of our observations to calculate a single error estimate of the model.

--

- Why is it called the **Effects Model**?

--

- The difference between `\(\mu_2\)` and `\(\mu_1\)` can be interpreted as the effect on our random variable that comes from being in group 2 in comparison to being in group 1.
--

- In our smoking example, the difference between `\(\mu_2\)` and `\(\mu_1\)` can be interpreted as the effect that smoking has on lung capacity.

--

- In other words, how much do we *expect* lung capacity to change if a person smokes in comparison to a person who does not?

---

# Graphical Representation Effects Model

- As we did with the Null Model, we can also represent the Effects Model graphically.

--

- Notice that we still don't have values for our parameters, so the graph can only show that we expect each group to be represented by a different Normal distribution, with a different mean but the same variance.

--

- We can use the following R code to generate that representation:

.pull-left[
```r
# Leave room at the bottom margin for the axis label
par(mai = c(1,0.1,0.1,0.1))
# First group: solid red Normal curve (the means 0 and 2 are arbitrary)
curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 6, axes = FALSE, ann = FALSE, col = "red", lwd = 3)
# Second group: dotted blue Normal curve, different mean but same sd
curve(dnorm(x, mean = 2, sd = 1), col = "blue", add = TRUE, lty = 3, lwd = 3)
box(bty = "l")
mtext(text = "Lung capacity", side = 1, line = 2, cex = 1.6)
legend("topleft", bty = "n", col = c("blue","red"), legend = c("non-smokers", "smokers"), lty = c(3, 1))
```
]

.pull-right[
<img src="data:image/png;base64,#lec-5_files/figure-html/normal-effect-out-1.png" style="display: block; margin: auto;" />
]

---

## Graphical Representation

- Notice that in the previous plot we do not have values assigned to the variable lung capacity.

--

- This is because we only want to show that the model assumes that the groups are samples from two different Normal distributions.

--

- Even without those values, we can (and should) formalize our theories about the world before looking at the data.

--

- In other words, we will always have a specification of our models before running an experiment, that is, before we are able to estimate the values of our parameters.

---

## Estimators for the Effects Model

- Now that we have specified our new model (which assumes that the groups are different), we have 3 parameters for which we want to find appropriate estimators.
--

- Our parameters are `\(\mu_1\)`, `\(\mu_2\)` and `\(\sigma_e^2\)`. Again, each of our estimators will be a statistic (a function of the sample), and we will denote them with a hat.

--

`$$\hat{\mu}_1 = \frac{1}{n_1} \sum_i y_{i1}$$`

--

`$$\hat{\mu}_2 = \frac{1}{n_2} \sum_i y_{i2}$$`

--

and finally,

`$$\hat{\sigma}_e^2 = \frac{1}{n} \sum_j \sum_i (y_{ij} - \hat{\mu}_j)^2$$`

- Here, `\(n\)` represents the total number of observations, and `\(n_j\)` represents the number of observations in group `\(j = 1, 2\)`.

---

## Estimators for the Effects Model

- Notice that our estimators for `\(\mu_1\)` and `\(\mu_2\)` are just the means of our observations by group!

--

- Again, we can interpret `\(\hat{\mu}_j\)` as the prediction of the model for each observation `\(i = 1, \dots, 4\)` of group `\(j = 1, 2\)`.

--

- Then we can interpret our estimator `\(\hat{\sigma}_e^2\)` as the average squared error in the predictions of the model. In other words, how far, on average, our observations are from our best prediction `\(\hat{\mu}_j\)` for their group.

--

- As we did with the Null Model, we will again be interested in the total error of the model, also known as the Sum of Squared Errors of the **Effects Model**.

--

- We will denote the Sum of Squared Errors of the **Effects Model** as:

`$$SSE_e = \sum_j \sum_i (y_{ij} - \hat{\mu}_j)^2$$`

---

- Form teams of 3 and calculate the model predictions for each group, the sum of squared errors and the mean squared error of the **Effects Model** for the smokers data:
---

## Smokers data: Effects Model

.can-edit.key-likes[
- Prediction:
]

.can-edit.key-likes[
- SSE:
]

.can-edit.key-likes[
- Mean Squared Error:
]
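
---

## Estimators in R: a sketch

- The estimators defined in this lecture for the Null Model and the Effects Model can be sketched in a few lines of R.

- This is only an illustration under stated assumptions: the scores in `y1` and `y2` are made-up numbers (they are **not** the smokers data from class), and every variable name below is hypothetical.

```r
# Hypothetical scores, made up for illustration (NOT the class data)
y1 <- c(3, 5, 4, 4)    # observations i = 1, ..., 4 of group j = 1
y2 <- c(6, 8, 7, 7)    # observations i = 1, ..., 4 of group j = 2
y  <- c(y1, y2)        # all observations together
n  <- length(y)        # total number of observations

# Null Model: a single prediction for every observation
mu_hat   <- mean(y)                  # model prediction
sse_0    <- sum((y - mu_hat)^2)      # Sum of Squared Errors
sigma2_0 <- sse_0 / n                # Mean Squared Error (divides by n)

# Effects Model: one prediction per group, one shared variance
mu_hat_1 <- mean(y1)                 # prediction for group 1
mu_hat_2 <- mean(y2)                 # prediction for group 2
sse_e    <- sum((y1 - mu_hat_1)^2) + sum((y2 - mu_hat_2)^2)
sigma2_e <- sse_e / n                # Mean Squared Error (divides by n)
```

- Note that R's built-in `var()` divides by `n - 1`, so it will not match the `1/n` estimators used in these slides; that is why the sketch computes the sums by hand.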